Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search
Abstract
Resource Description Framework (RDF) and Property Graph databases are emerging technologies for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched, tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded as a directed acyclic graph in which each node represents a building block and each edge represents a chemical linkage between two building blocks. In this context, graph databases are possible software solutions for storing glycan structures, and graph query languages such as SPARQL and Cypher can be used to perform a substructure search. Glycan substructure search is an important feature for querying structural and experimental glycan databases and retrieving biologically meaningful data. One example is identifying the region of a glycan recognised by a glycan binding protein (GBP). In this study, 19,404 glycan structures were selected from GlycomeDB (www.glycome-db.org) and modelled for storage in an RDF triple store and in a Property Graph. We then performed two different sets of searches and compared the query response times and the results from both technologies to assess performance and accuracy. The two implementations produced the same results, but we noted a difference in the query response times. Qualitative measures such as portability were also used to define further criteria for choosing the technology best suited to glycan substructure search and other comparable problems.
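To make the encoding described above concrete, the following is a minimal in-memory sketch in Python of a glycan stored as a rooted tree, together with a naive substructure test. It is not the data model or the SPARQL and Cypher queries used in the study. The class `Glycan`, the function `contains_substructure`, the node identifiers, and the linkage strings such as `"a1-3"` are hypothetical names chosen for illustration. The example structure is the common N-glycan core, used here only as a familiar test case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Glycan:
    """Rooted tree: nodes are building blocks, edges are chemical linkages."""
    residues: dict[str, str]  # node id -> residue name (hypothetical labels)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (parent, child, linkage)

    def children(self, node: str) -> list[tuple[str, str]]:
        # Return (child id, linkage) pairs for every edge leaving `node`.
        return [(c, link) for p, c, link in self.edges if p == node]

    def root(self) -> str:
        # The root is the only node that never appears as a child.
        kids = {c for _, c, _ in self.edges}
        return next(n for n in self.residues if n not in kids)


def _match_at(pattern: Glycan, p_node: str, target: Glycan, t_node: str) -> bool:
    """True if the pattern subtree rooted at p_node embeds at t_node."""
    if pattern.residues[p_node] != target.residues[t_node]:
        return False
    return _assign(pattern, pattern.children(p_node), target, target.children(t_node))


def _assign(pattern: Glycan, p_kids, target: Glycan, t_kids) -> bool:
    """Backtracking: map each pattern child to a distinct target child
    with the same linkage and a matching subtree."""
    if not p_kids:
        return True
    (p_child, p_link), rest = p_kids[0], p_kids[1:]
    for i, (t_child, t_link) in enumerate(t_kids):
        if p_link == t_link and _match_at(pattern, p_child, target, t_child):
            if _assign(pattern, rest, target, t_kids[:i] + t_kids[i + 1:]):
                return True
    return False


def contains_substructure(target: Glycan, pattern: Glycan) -> bool:
    """Try to anchor the pattern root at every node of the target."""
    p_root = pattern.root()
    return any(_match_at(pattern, p_root, target, t) for t in target.residues)


# Illustrative target: GlcNAc-GlcNAc-Man with two Man branches.
core = Glycan(
    residues={"n1": "GlcNAc", "n2": "GlcNAc", "n3": "Man", "n4": "Man", "n5": "Man"},
    edges=[("n1", "n2", "b1-4"), ("n2", "n3", "b1-4"),
           ("n3", "n4", "a1-3"), ("n3", "n5", "a1-6")],
)

# Pattern present in the target: a Man carrying a1-3 and a1-6 linked Man residues.
branch = Glycan(
    residues={"p1": "Man", "p2": "Man", "p3": "Man"},
    edges=[("p1", "p2", "a1-3"), ("p1", "p3", "a1-6")],
)

# Pattern absent from the target: a Man carrying a b1-4 linked Gal.
absent = Glycan(
    residues={"q1": "Man", "q2": "Gal"},
    edges=[("q1", "q2", "b1-4")],
)

print(contains_substructure(core, branch))
print(contains_substructure(core, absent))
```

In a graph database the same question is expressed declaratively as a pattern of nodes and edges, and the query engine performs the matching. The sketch only shows what such a pattern has to check: residue labels on nodes, linkage labels on edges, and distinct children for distinct branches.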
Similar resources
Glycan Pattern Search
Glycans are branched tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded into a directed acyclic graph where each node represents a building block and each edge serves as a chemical linkage between two building blocks. In this context RDF is a possible software solution for storing structures and SPARQL can be direc...
Jena Property Table Implementation
A common approach to providing persistent storage for RDF is to store statements in a three-column table in a relational database system. This is commonly referred to as a triple store. Each table row represents one RDF statement. For RDF graphs with frequent patterns, an alternative storage scheme is a property table. A property table comprises one column containing a statement subject plus on...
Supporting Scalable, Persistent Semantic Web Applications
To realize the vision of the Semantic Web, efficient storage and retrieval of large RDF data sets is required. A common technique for persisting RDF data (graphs) is to use a single relational database table, a triple store. But, we believe a single triple store cannot scale for large-scale applications. This paper describes storing and querying persistent RDF graphs in Jena, a Semantic Web pro...
Incremental characterization of RDF Triple Stores
Many semantic web applications integrate data from distributed triple stores and to be efficient, they need to know what kind of content each triple store holds in order to assess if it can contribute to its queries. We present an algorithm to build indexes summarizing the content of triple stores. We extended Depth-First Search coding to provide a canonical representation of RDF graphs and we ...
A Scale-Out RDF Molecule Store for Improved Co-Identification, Querying and Inferencing
Semantic inferencing and querying across large scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google’s MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF Molecules appear to offer an ideal appr...